Yi Wu

Deep Research as Rubric for Reinforcement Learning

May 31, 2026 (via arXiv)

Tournament-GRPO: Group-Wise Tournament Rewards for Reinforcement Learning in Open-Ended Long-Form Generation

May 26, 2026 (via arXiv)

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

May 26, 2026 (via arXiv)

Verifiable Process Rewards for Agentic Reasoning

May 11, 2026 (via arXiv)

Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents

May 07, 2026 (via arXiv)

PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training

Apr 04, 2026 (via arXiv)

ExVerus: Verus Proof Repair via Counterexample Reasoning

Mar 26, 2026 (via arXiv)

AU Codes, Language, and Synthesis: Translating Anatomy to Text for Facial Behavior Synthesis

Mar 19, 2026 (via arXiv)

Aligning Large Language Models with Searcher Preferences

Mar 11, 2026 (via arXiv)

ECG-MoE: Mixture-of-Expert Electrocardiogram Foundation Model

Mar 04, 2026 (via arXiv)